Analysis of Syntax-Based Pronoun Resolution Methods
Abstract
This paper presents a pronoun resolution algorithm that adheres to the constraints and rules of Centering Theory (Grosz et al., 1995) and is an alternative to Brennan et al.'s 1987 algorithm. The advantages of this new model, the Left-Right Centering Algorithm (LRC), lie in its incremental processing of utterances and in its low computational overhead. The algorithm is compared with three other pronoun resolution methods: Hobbs' syntax-based algorithm, Strube's S-list approach, and the BFP Centering algorithm. All four methods were implemented in a system and tested on an annotated subset of the Treebank corpus consisting of 2026 pronouns. The noteworthy results were that Hobbs and LRC performed the best.
1 Introduction
The aim of this project is to develop a pronoun resolution algorithm which performs better than the Brennan et al. 1987 algorithm (henceforth BFP) as a cognitive model while also performing well empirically. A revised algorithm (Left-Right Centering) was motivated by the fact that the BFP algorithm did not allow for incremental processing of an utterance and hence of its pronouns, and also by the fact that it occasionally imposes a high computational load, detracting from its psycholinguistic plausibility.
A second motivation for the project is to remedy the dearth of empirical results on pronoun resolution methods. Many small comparisons of methods have been made, such as by Strube (1998) and Walker (1989), but those usually consist of statistics based on a small hand-tested corpus. The problem with evaluating algorithms by hand is that it is time consuming and difficult to process corpora that are large enough to provide reliable, broadly based statistics. By creating a system that can run algorithms, one can easily and quickly analyze large amounts of data and generate more reliable results.
In this project, the new algorithm is tested against three leading syntax-based pronoun resolution methods: Hobbs' naive algorithm (1977), S-list (Strube 1998), and BFP. Section 2 presents the motivation and algorithm for Left-Right Centering. In Section 3, the results of the algorithms are presented and then discussed in Section 4.
2 Left-Right Centering Algorithm
Left-Right Centering (LRC) is a formalized algorithm built upon centering theory's constraints and rules as detailed in Grosz et al. (1995). The creation of the LRC Algorithm is motivated by two drawbacks found in the BFP method.
The first is BFP's limitation as a cognitive model, since it makes no provision for incremental resolution of pronouns (Kehler 1997). Psycholinguistic research supports the claim that listeners process utterances one word at a time, so when they hear a pronoun they will try to resolve it immediately. If new information comes into play which makes the resolution incorrect (such as a violation of binding constraints), the listener will go back and find a correct antecedent. This incremental resolution problem also motivates Strube's S-list approach.
The second drawback to the BFP algorithm is the computational explosion of generating and filtering anchors. In utterances with two or more pronouns and a Cf-list with several candidate antecedents for each pronoun, thousands of anchors can easily be generated, making for a time-consuming filtering phase. An exam-
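The growth in anchors described above can be illustrated with a minimal Python sketch. It is not the BFP implementation evaluated in the paper. It assumes, for illustration only, that an anchor pairs one candidate backward-looking center (any entity of the previous Cf-list, or none) with one choice of antecedent per pronoun. The function names and the entity labels `e1` to `e6` are hypothetical. The incremental count stands for a generic strategy that inspects each pronoun's candidates once in turn, and it is not a rendering of LRC itself.

```python
from itertools import product


def enumerate_anchors(cb_candidates, candidates_per_pronoun):
    """Enumerate every (Cb, antecedent assignment) pair before any filtering.

    Assumption: an anchor is one candidate Cb combined with one antecedent
    choice for each pronoun, so the total is a cross-product.
    """
    return [
        (cb, assignment)
        for cb in cb_candidates
        for assignment in product(*candidates_per_pronoun)
    ]


def count_incremental_checks(candidates_per_pronoun):
    """Upper bound on checks when each pronoun is resolved on its own, in turn."""
    return sum(len(candidates) for candidates in candidates_per_pronoun)


# Hypothetical previous utterance with six entities on its Cf-list.
prev_cf = ["e1", "e2", "e3", "e4", "e5", "e6"]
cb_candidates = prev_cf + [None]  # None stands for "no Cb"

# Three pronouns, each compatible with every entity on the list.
pronoun_candidates = [prev_cf, prev_cf, prev_cf]

anchors = enumerate_anchors(cb_candidates, pronoun_candidates)
print(len(anchors))                                   # 7 * 6 * 6 * 6 = 1512
print(count_incremental_checks(pronoun_candidates))   # 6 + 6 + 6 = 18
```

Under these assumptions the number of anchors is a product over pronouns, while the number of per-pronoun checks is only a sum, which is the contrast the paragraph above draws between generate-and-filter and incremental resolution.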
Similar resources
Cross-linguistic Influence at Syntax-pragmatics Interface: A Case of OPC in Persian
Recent research in the area of Second Language Acquisition has proposed that bilinguals and L2 learners show syntactic indeterminacy when syntactic properties interface with other cognitive domains. Most of the research in this area has focused on the pragmatic use of syntactic properties while the investigation of compliance with a grammatical rule at syntax-related interfaces has not received...
Translation of "It" in a Deep Syntax Framework
We present a novel approach to the translation of the English personal pronoun it to Czech. We conduct a linguistic analysis on how the distinct categories of it are usually mapped to their Czech counterparts. Armed with these observations, we design a discriminative translation model of it, which is then integrated into the TectoMT deep syntax MT framework. Features in the model take advantage...
Large Corpus-based Semantic Feature Extraction for Pronoun Coreference
Semantic information is a very important factor in coreference resolution. The combination of large corpora and ‘deep’ analysis procedures has made it possible to acquire a range of semantic information and apply it to this task. In this paper, we generate two statistically-based semantic features from a large corpus and measure their influence on pronoun coreference. One is contextual compatib...
Improve Tree Kernel-Based Event Pronoun Resolution with Competitive Information
Event anaphora resolution plays a critical role in discourse analysis. This paper proposes a tree kernel-based framework for event pronoun resolution. In particular, a new tree expansion scheme is introduced to automatically determine a proper parse tree structure for event pronoun resolution by considering various kinds of competitive information related with the anaphor and the antecedent can...
Resolution of Difficult Pronouns Using the ROSS Method
A new natural language understanding method for disambiguation of difficult pronouns is described. Difficult pronouns are those pronouns for which a level of world or domain knowledge is needed in order to perform anaphoral or other types of resolution. Resolution of difficult pronouns may in some cases require a prior step involving the application of inference to a situation that is represent...